Temporal Alignment of Captured Inertial Measurement Unit Trajectory and 2D Video For Golf Swing Analysis

ABSTRACT

A method for temporal alignment of golf club inertial measurement data and two-dimensional video for golf club swing analysis is provided. The method includes capturing inertial measurement data of a golf club swing on an inertial measurement unit (IMU), and sending the inertial measurement data of the golf club swing from the inertial measurement unit to a computing device. The computing device is configured to calculate a time bias between the inertial measurement data of the golf club swing and a two-dimensional video of the golf club swing, based on the computing device detecting an impact frame in the two-dimensional video and the computing device aligning the detected impact frame to an impact time from the captured inertial measurement data of the golf club swing.

BACKGROUND

As an increasingly popular sport, golf has attracted millions of people around the world. Athletes and amateurs are always looking for ways to improve their skills. Sensor-based golf coaching systems are commercially available. One such system provides an IMU (inertial measurement unit) sensor, denoted as M-Tracer™, on the golf club. The sensor tracks the golf club and outputs a high frequency swing trajectory as well as many other metrics such as impact speed, shaft angle, etc. Although the sensor-based golf coaching systems provide useful information, it is still difficult for a normal user to understand the information and link that information to his or her performance. It is within this context that the embodiments arise.

SUMMARY

In some embodiments, a method for temporal alignment of golf club inertial measurement data and two-dimensional video for golf club swing analysis is provided. The method includes capturing inertial measurement data of a golf club swing on a golf-club attached inertial measurement unit (IMU), and sending the inertial measurement data of the golf club swing from the golf club attached inertial measurement unit to a computing device. The computing device is configured to calculate a time bias between the inertial measurement data of the golf club swing and a two-dimensional video of the golf club swing, based on the computing device detecting an impact frame in the two-dimensional video and the computing device aligning the detected impact frame to an impact time from the captured inertial measurement data of the golf club swing.

In some embodiments, a method for temporal alignment of golf club inertial measurement data and two-dimensional video for golf club swing analysis, performed by a computing device is provided. The method includes receiving captured inertial measurement data of a golf club swing from an inertial measurement unit (IMU) attached to a golf club and receiving, from a device having a video camera, a two-dimensional video of the golf club swing. The method includes detecting an impact frame in the two-dimensional video, and calculating a time bias based on aligning the detected impact frame from the two-dimensional video and an impact time from the captured inertial measurement data of the golf club swing.

In some embodiments, a tangible, non-transitory, computer-readable media having instructions thereupon which, when executed by a processor, cause the processor to perform a method, is provided. The method includes receiving, from an inertial measurement unit (IMU), inertial measurement data of a golf club swing and receiving a two-dimensional video of the golf club swing. The method includes determining a location of an object in at least one frame in the two-dimensional video and determining an impact frame, by searching backwards from a later frame to an earlier frame in the two-dimensional video for a frame in which brightness at the determined location relative to the frame exceeds a threshold. The method includes aligning the detected impact frame from the two-dimensional video to an impact time from the inertial measurement data of the golf club swing and determining a time bias between the frames of the two-dimensional video and frames of the inertial measurement data of the golf club swing based on the aligning. The method includes overlaying a projected golf club swing trajectory onto the two-dimensional video of the golf club swing, based on temporal alignment, with the time bias, of the inertial measurement data of the golf club swing and the two-dimensional video.

Other aspects and advantages of the embodiments will become apparent from the following detailed description taken in conjunction with the accompanying drawings which illustrate, by way of example, the principles of the described embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

The described embodiments and the advantages thereof may best be understood by reference to the following description taken in conjunction with the accompanying drawings. These drawings in no way limit any changes in form and detail that may be made to the described embodiments by one skilled in the art without departing from the spirit and scope of the described embodiments.

FIG. 1 depicts an inertial measurement unit (IMU) captured golf swing trajectory overlaid on a video, in a golf coaching system in accordance with some embodiments.

FIG. 2 is a flow diagram of a method for overlaying an IMU captured golf swing trajectory onto a two-dimensional video in accordance with some embodiments.

FIG. 3A depicts a coordinate system for an IMU, relative to gravity and a ball running direction in accordance with some embodiments.

FIG. 3B is a front view of a cell phone camera coordinate system in accordance with some embodiments.

FIG. 3C is a front view of a cell phone IMU coordinate system in accordance with some embodiments.

FIG. 4 is a flow diagram of a method to align z-axis of IMU coordinate space and camera coordinate space in accordance with some embodiments.

FIG. 5 is a flow diagram of a method to estimate translation between IMU coordinate space and camera coordinate space in accordance with some embodiments.

FIG. 6 depicts positioning a golf club straight up and down for a distance estimation from a frontal camera view in accordance with some embodiments.

FIG. 7 is a flow diagram of a method to align x and y axes of IMU coordinate space and camera coordinate space in accordance with some embodiments.

FIG. 8 is a flow diagram of a method for user assisted mode in accordance with some embodiments.

FIGS. 9A and 9B show examples of user assisted mode: clicking on a location of the golf ball (left) in a video frame and clicking on a location of the IMU (right) in a video frame in accordance with some embodiments.

FIG. 10 depicts a marker coordinate system in accordance with some embodiments.

FIG. 11 shows measurements of a ball location in marker coordinate space in accordance with some embodiments.

FIGS. 12A and 12B show symmetry of intensity (left) and symmetry of gradient direction (right) in images of golf balls in accordance with some embodiments.

FIGS. 13A and 13B depict using a distance transform to find the center of the ball in accordance with some embodiments.

FIG. 14 depicts fitting a semicircle to the ball in accordance with some embodiments.

FIGS. 15A and 15B depict intensity profiles of lines through images of golf balls in accordance with some embodiments.

FIG. 16 is a flow diagram of a method for determining a time bias between a video frame and an inertial measurement unit sample, for synchronizing in accordance with some embodiments.

FIGS. 17A-C show an example of false impact frame detection in accordance with some embodiments.

FIG. 18 depicts a temporal pattern of intensity at the ball location, in video frames in accordance with some embodiments.

FIG. 19 depicts temporal alignment by changing the time offset of the inertial measurement unit samples or frames relative to video frames in accordance with some embodiments.

FIG. 20 is a block diagram of a golf coaching system in accordance with some embodiments.

FIG. 21 is a flow diagram of a method for temporal alignment of an inertial measurement unit captured golf club swing and a video of the golf club swing in accordance with some embodiments.

FIG. 22 is an illustration showing an exemplary computing device which may implement the embodiments described herein.

DETAILED DESCRIPTION

A golf coaching system for golf swing analysis performs spatial and temporal alignment of an inertial measurement unit (IMU) captured golf swing and a two-dimensional video of the golf swing, using an apparatus and various methods described herein. The methods can be performed on one or more processors, such as a processor of an IMU, a processor of a computing device and/or a processor of a mobile device (which could also be a computing device). One device that is suitable for performing portions of various methods and serving as a portion of a suitable apparatus is the M-Tracer™ of the assignee, which is an IMU that can be mounted to a golf club. The M-Tracer™ is equipped with wireless communication, and can send IMU data to another wireless device. Although embodiments are described herein using the M-Tracer™ as an IMU in one embodiment, it should be appreciated that variations and further embodiments are readily devised using other IMU systems, as the embodiments are not limited to the M-Tracer™ product.

FIG. 1 depicts an inertial measurement unit (IMU) captured golf swing trajectory overlaid on a video, in a golf coaching system. The system allows a user to directly see a visualization of the trajectory on top of his golf swing video as captured by a mobile device such as a smart phone or a tablet as shown in FIG. 1. A projected golf club swing trajectory 101 is overlaid onto a two-dimensional video 103 of the golf club swing, and displayed on a view screen of a mobile device, in this example. Since the M-Tracer™ system, or any other suitable IMU system, and the camera system have different coordinate systems, a method is needed to spatially align these coordinate systems. In this disclosure a method to automatically calibrate the IMU system and the camera system, spatially and temporally, is described. After calibration, the captured trajectory can be overlaid on top of the video.

With reference to FIGS. 1-15 and 20, a method and apparatus to spatially align the IMU captured golf swing trajectory with 2D (two-dimensional) video captured by a static mobile device such as a smart phone or tablet are disclosed. After alignment, the user is able to visualize his or her swing trajectory on top of the 2D video. The method can automatically estimate the rotation and translation between camera coordinate space and IMU system coordinate space, with the following steps:

(1) Read gravity sensor readings from the mobile device during static time.

(2) Align z-axis by aligning the gravity direction captured by the mobile device and the IMU system, e.g., M-Tracer™.

(3) Estimate translation by detecting the golf ball in a 2D video frame and estimating the size of the golf ball.

(4) Align x-/y- axes by either (a) maximizing the similarity between a re-projected IMU system captured golf club line and a detected golf club line from multiple 2D video frames or (b) minimizing the error between the re-projected IMU system captured IMU system sensor location and the detected IMU system sensor location from multiple 2D video frames. The location of the IMU system can be detected by attaching a marker on the sensor surface or detecting an LED (light emitting diode) or other flashing or non-flashing light positioned on the sensor, through analysis of the video frames.

(5) User assisted mode, in some embodiments, can be activated to improve cases that have inaccurate output, with minimal user assistance.

Also disclosed herein is a marker based method to spatially align the IMU system captured golf swing trajectory with a 2D video captured by a static regular camera that does not necessarily come with an embedded IMU sensor. The method has the following steps:

(1) Put a marker on the floor.

(2) Perform marker detection and calculate rotation and translation from marker coordinate space to camera coordinate space.

(3) Estimate translation from IMU system coordinate space to marker coordinate space by measuring the distance between the golf ball, or other object, and a marker origin. In some embodiments, as an alternative, the translation estimation from IMU system coordinate space to camera coordinate space can be done directly by detecting the golf ball in a 2D video frame and estimating the size of the golf ball.

(4) Align x-/y- axes between marker space and IMU system space by either (a) maximizing the similarity between a re-projected IMU system captured golf club line and a detected club line from multiple 2D video frames or (b) minimizing the error between the re-projected, IMU system captured, IMU system location and the detected IMU system location from multiple 2D video frames. The location of the IMU system can be detected in video frames by detecting a further marker attached on the sensor surface or detecting the LED flashing light on the sensor.

(5) User assisted mode, in some embodiments, can be activated to improve cases that have inaccurate output, with minimal user assistance. The methods are detailed in the following sections.

In US patent publication 2015/0072797, a system to overlay an IMU sensor captured golf swing trace onto a 2D video is provided. The method spatially aligns these two systems by requiring the golfer to stand at a predefined position. The system renders a human model on a preview screen and the golfer adjusts his or her position such that his or her posture aligns with the model. The human model is calculated based on information regarding the camera direction, such as front, back, etc., under the assumption that the golf player and camera are either parallel or perpendicular. The distance between the camera and golf player is calculated based on the information of the golf player's height and some pre-set parameters estimated from the Japanese population. It should be appreciated that assumption of camera position will not always be guaranteed and the pre-learned parameter may not be accurate for each case. If there exists a tilt in the camera or the distance between camera and player is inaccurate, the overlay of sensor captured swing trajectory will deviate from the actual swing motion from the video. In such a case, a manual adjustment has to be implemented through a user interface.

Present embodiments, however, provide a fully automatic solution to align the IMU system signal with video without requiring any additional human interaction, except for the optional user assisted mode in some embodiments. The golfer can stand at any position without pre-adjusting positions and the system does not require the camera to be positioned in parallel or perpendicular to the golf player. The method calibrates IMU system coordinate space and camera coordinate space by using IMU information from the smart phone and detecting image features, such as using a detected golf club line to align x-y plane rotation and using a determined golf ball size to determine distance between camera and golf player.

The embodiments align the IMU system trajectory with a 2D golf swing video by estimating the transformation from the IMU system coordinate system to the camera coordinate system, including three rotation angles (pitch θ, roll φ and yaw ψ) and three translation values in x, y and z axes. FIG. 2 is a flow diagram of a method for overlaying an IMU captured golf swing trajectory onto a two-dimensional video.

The system takes the following as input: (1) IMU Reading: gravity sensor reading when the mobile device is static, (2) IMU system trajectory, (3) RGB (red, green, blue, i.e. color, or in further embodiments black and white) video from a mobile device, (4) camera intrinsics, and (5) temporal alignment information represented as time bias between the first frame of IMU system reading and the first RGB video frame. The temporal alignment is assumed to be solved by some method. A method of doing temporal alignment is provided in the present disclosure with reference to FIGS. 1 and 16-20, after the method of spatial alignment. Aspects of the methods disclosed herein can be performed by various modules as described below, in various embodiments. The modules can be implemented in software executing on a processor, hardware, firmware or combinations thereof, in a computing device or mobile device. In one embodiment, a pipeline includes these modules:

(1) Z-axis alignment 202 to solve pitch and roll angle. This module or process receives an IMU reading 212, for example from an IMU associated with a camera, and the captured M-Tracer™ trajectory 214, from the IMU attached to the golf club, and outputs z axis alignment information to the overlay 210 module or process.

(2) Translation estimation 204 by detecting the golf ball and estimating its size. This module or process receives the M-Tracer™ trajectory 214 of the golf club swing, RGB video 212 (e.g., the two-dimensional video of the golf club swing), camera intrinsics 214, and time bias 216, and outputs the translation matrix to the x y axis alignment 206 module or process and the overlay 210 module or process. Included in the translation estimation 204 are golf ball detection 220 and golf ball size estimation 222.

(3) X-Y- axis alignment 206 by detecting and tracking golf clubs or detecting and tracking M-Tracer™ sensor to solve yaw angle. This module or process receives the translation matrix from the translation estimation 204 module or process, and outputs x y axis alignment information to the overlay 210 module or process. Included in the x y axis alignment 206 is golf club/M-Tracer™ detection and tracking 224.

(4) User assisted mode 208. This mode can be activated to improve the cases which have inaccurate output. This receives overlay information from the overlay 210 module or process, and user input 218, and outputs information to the translation estimation 204 module or process useful in golf ball detection 220.

(5) Overlay 210, to overlay the projected M-Tracer™ trajectory on the video. This module or process receives z axis alignment information from z axis alignment 202, the translation matrix from translation estimation 204, x y axis alignment information from x y axis alignment 206, the M-Tracer™ trajectory 214, the RGB video 212, the camera intrinsics 214, and the time bias 216. Overlay 210, in some embodiments, interacts with user assisted mode 208. Output of overlay 210 is the overlaid video 226, which is also used by translation estimation 204 for refinement of the translation matrix.

Notations and Assumptions used herein are described below. Let [x_(T), y_(T), z_(T)]^(T) denote a point represented in M-Tracer™, or other suitable IMU system, coordinate space and [x_(c), y_(c), z_(c)]^(T) denote the same point represented in camera space, thus

[x_(c), y_(c), z_(c)]^(T) = R_(MC) [x_(T), y_(T), z_(T)]^(T) + T_(MC),   (Eq. 1)

where R_(MC) and T_(MC) denote the rotation and translation from IMU system coordinate system to camera coordinate system.

Any 3D rotation can be represented by three Euler angles: roll (φ), pitch (θ) and yaw (ψ), which are defined as the rotation angles around the x, y and z axes, respectively. Therefore, R_(MC) can be further decomposed as

R_(MC) = R(θ, φ) R(ψ)   (Eq. 2)

In the following sections, the method to estimate R(θ, φ), R(ψ) and T_(MC) is detailed. After obtaining R_(MC) and T_(MC), the IMU system trajectory can be transformed to the camera coordinate system and its projection onto the 2D image plane [u, v] can be calculated accordingly using camera intrinsic parameters, i.e.,

$\begin{matrix} u = \frac{f_{x} x_{c}}{z_{c}} + c_{x} & \text{(Eq. 3)} \\ v = \frac{f_{y} y_{c}}{z_{c}} + c_{y} & \text{(Eq. 4)} \end{matrix}$

where f_(x), f_(y) denote the focal lengths measured in units of pixel width and pixel height, respectively, and [c_(x), c_(y)] denotes the principal point coordinates.
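As an illustration only, Eq. 1 and Eqs. 3-4 can be sketched in a few lines of Python. The function names, the use of plain nested lists for R_(MC), and the numeric values in the usage note are assumptions made for this sketch and are not part of the described system.

```python
def transform_to_camera(point_imu, rotation, translation):
    """Eq. 1: map a point from IMU system space to camera space.

    rotation is a 3x3 nested list (R_MC), translation a length-3 list (T_MC).
    """
    return [
        sum(rotation[r][c] * point_imu[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]


def project_to_image(point_cam, fx, fy, cx, cy):
    """Eqs. 3-4: pinhole projection of a camera-space point to pixel [u, v]."""
    x_c, y_c, z_c = point_cam
    if z_c <= 0:
        # A point on or behind the image plane has no valid projection.
        raise ValueError("point must be in front of the camera (z_c > 0)")
    u = fx * x_c / z_c + cx
    v = fy * y_c / z_c + cy
    return u, v
```

For instance, with an identity rotation, a translation of [0, 0, 2], f_(x) = f_(y) = 1000 and a principal point of [640, 360], the trajectory point [0.5, -0.25, 0] maps to the camera-space point [0.5, -0.25, 2] and then to the pixel [890, 235].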

There are some assumptions made by the proposed algorithm as follows:

(1) Camera is equipped with an IMU sensor that can measure the gravity direction relative to camera IMU coordinates.

(2) Camera is mounted to a fixed position and kept static during the golf club swing.

(3) During address time (i.e., the moment of golf club impact to the golf ball), golf club position is close to the golf ball.

Details of the Coordinate Systems are provided below. In order for the system to determine the transformation from the IMU system coordinate system to the camera coordinate system, three body coordinates are involved: IMU system CS (coordinate system), cell phone IMU CS and cell phone camera CS. These body coordinates are defined further below, in some embodiments. Note that in various embodiments, the coordinate system is defined as follows, but the algorithm does not require these specific coordinate systems, and conversions to other coordinate systems are readily devised so that different coordinate systems may be integrated with the embodiments.

FIG. 3A depicts a coordinate system for an IMU, relative to gravity and a ball running direction. The origin (0, 0, 0) of the golf club attached IMU coordinate system is at the golf club head position when the golf club head hits the ball, and is assumed the same as the ball position. The z axis 302 is defined as aligned with gravity, and the positive values along the z axis extending upward. The x axis 304 is defined as aligning with the ball running direction, with the positive values along the x axis extending forward in the direction the ball travels initially after the address (i.e. moment of impact). The y axis 306 is defined as orthogonal to the x and z axes.

FIG. 3B is a front view of a cell phone camera coordinate system. The origin of the cell phone camera coordinate system could be located anywhere in the frame of the video, for example at a corner of the screen or frame, in the middle of the screen or frame, or in an upper region of the frame as depicted. The y axis 308 is pointing downward relative to the video frame. The x axis 310 is pointing rightward relative to the video frame. The z axis 312 is pointing upward perpendicular to the video frame, orthogonal to the x and y axes.

FIG. 3C is a front view of a cell phone IMU coordinate system. The origin of the cell phone IMU coordinate system could be at the location of the cell phone IMU, or elsewhere relative to the cell phone. The y axis 314 is pointing upward, parallel to the main body of the cell phone. The x axis 316 is pointing rightward relative to the main body of the cell phone. The z axis 318 is perpendicular to the main body of the cell phone and is pointing upward relative to the screen or face of the cell phone.

FIG. 4 is a flow diagram of a method to align z axis of IMU coordinate space and camera coordinate space. In an action 402, gravity sensor values are read from a mobile device. In an action 404, a gravity-up vector in camera space is calculated according to the equation 5. In an action 406, a gravity-up vector is read from the IMU system. In an action 408, pitch and roll angles are calculated according to equations 8 and 9. This method can be performed by Module 1: Z Axis Alignment (R(θ, φ)).

Let [g_(xi), g_(yi), g_(zi)]^(T) denote the unit gravity_up vector represented in the cell phone IMU coordinate system, which can be obtained by normalizing the reading from the gravity sensor, where gravity_up denotes the gravity direction pointing upward. Let [g_(xc), g_(yc), g_(zc)]^(T) denote the corresponding representation in the camera coordinate system. According to FIGS. 3B and 3C, it is observed that

[g_(xc), g_(yc), g_(zc)]^(T) = R_(i2c) [g_(xi), g_(yi), g_(zi)]^(T)   (Eq. 5)

where R_(i2c) denotes the rotation from IMU coordinate space to camera coordinate space, which can be obtained through an IMU-camera calibration process such as the method of Lobo and Dias published in 2007. In most smart phones and tablets, the IMU chip is installed such that the three axes are aligned with camera axes as illustrated in FIGS. 3B and 3C, thus R_(i2c) can be reasonably simplified as

$R_{i2c} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$

Let [g_(xT), g_(yT), g_(zT)]^(T) denote the unit gravity_up vector represented in the M-Tracer™ coordinate system. Aligning [g_(xc), g_(yc), g_(zc)]^(T) with [g_(xT), g_(yT), g_(zT)]^(T) is equivalent to finding R(θ, φ) such that

[g_(xc), g_(yc), g_(zc)]^(T) = R(θ, φ) [g_(xT), g_(yT), g_(zT)]^(T)   (Eq. 6)

As depicted in FIG. 3A, in the current work, gravity_up aligns with the z-axis in the M-Tracer™ coordinate system, i.e., [g_(xT), g_(yT), g_(zT)]^(T)=[0,0,1]^(T). R(θ, φ) can be expressed as

$\begin{matrix} R(\theta, \varphi) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & \sin\varphi \\ 0 & -\sin\varphi & \cos\varphi \end{bmatrix} \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix} & \text{(Eq. 7)} \end{matrix}$

where R(θ, φ)₁₃=−sin θ, R(θ, φ)₂₃=sin φ cos θ, R(θ, φ)₃₃=cos φ cos θ. Thus the roll and pitch angles can be solved as

φ = atan2(g_(yc), g_(zc))   (Eq. 8)

θ = atan2(−g_(xc), (g_(yc)*sin φ + g_(zc)*cos φ))   (Eq. 9)

Thus by setting ψ as an arbitrary value to be optimized later, e.g., ψ=0, R(θ, φ) can be calculated accordingly.
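The z axis alignment of Eqs. 5 through 9 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the simplified R_(i2c) = diag(1, −1, −1) is used, ψ is fixed at 0, and the gravity sensor reading is taken to point along gravity_up as described above.

```python
import math


def phone_gravity_to_camera(gravity_reading):
    """Eq. 5 with the simplified R_i2c = diag(1, -1, -1).

    Normalizes the raw gravity sensor reading to a unit vector, then flips
    the y and z components to express it in camera coordinates.
    """
    norm = math.sqrt(sum(c * c for c in gravity_reading))
    if norm == 0.0:
        raise ValueError("gravity reading must be non-zero")
    g_xi, g_yi, g_zi = (c / norm for c in gravity_reading)
    return (g_xi, -g_yi, -g_zi)


def roll_pitch_from_gravity(g_cam):
    """Eqs. 8-9: solve roll (phi) and pitch (theta) from gravity_up in camera space."""
    g_xc, g_yc, g_zc = g_cam
    roll = math.atan2(g_yc, g_zc)
    pitch = math.atan2(-g_xc, g_yc * math.sin(roll) + g_zc * math.cos(roll))
    return roll, pitch


def rotation_roll_pitch(roll, pitch):
    """Eq. 7 evaluated with psi = 0, i.e. the product Rx(roll) * Ry(pitch)."""
    s_r, c_r = math.sin(roll), math.cos(roll)
    s_p, c_p = math.sin(pitch), math.cos(pitch)
    return [
        [c_p, 0.0, -s_p],
        [s_r * s_p, c_r, s_r * c_p],
        [c_r * s_p, -s_r, c_r * c_p],
    ]
```

A quick consistency check is that the third column of the matrix returned by `rotation_roll_pitch` reproduces the camera-space gravity vector, which is the content of Eq. 6 when [g_(xT), g_(yT), g_(zT)]^(T)=[0,0,1]^(T).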

FIG. 5 is a flow diagram of a method to estimate translation between IMU coordinate space and camera coordinate space. In an action 502, the ball (i.e., golf ball) is detected from the two-dimensional video frame. In an action 504, the ball size is detected from the two-dimensional video frame. In an action 506, the translation is estimated according to equation 11. This results in a transformation matrix.

This method can be performed by Module 2: Calculating Translation. As shown in FIG. 3A, the origin of the IMU system coordinate system is the location of the golf ball (and same as the club head position when impact happens). Therefore, its translation to the camera center is equivalent to the 3D location of the ball ([X_(ball), Y_(ball), Z_(ball)]) in the camera coordinate system.

T _(MC) =[X _(ball) , Y _(ball) , Z _(ball)]^(T)   (Eq. 10)

Let u_(ball), v_(ball) and rad_(ball) be the location and radius of the detected golf ball in a 2D image. Embodiments for the detection of ball location and ball size can be found later in the disclosure, with reference to FIGS. 8-15. X_(ball), Y_(ball), Z_(ball) are found by the following formula assuming a pin-hole camera model

$\begin{matrix} X_{ball} = \frac{(u_{ball} - c_{x})}{f_{x}} Z_{ball}, \quad Y_{ball} = \frac{(v_{ball} - c_{y})}{f_{y}} Z_{ball}, \quad Z_{ball} = \frac{f}{rad_{ball}} R_{ball}, & \text{(Eq. 11)} \end{matrix}$

where f_(x) and f_(y) denote the focal lengths and c_(x) and c_(y) denote the principal point of the camera, all obtained from the camera intrinsics, and R_(ball) is the a priori known actual golf ball size (radius) in 3D.
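Eqs. 10 and 11 can be sketched as below. Several details are assumptions of this sketch rather than statements of the disclosure: the function name, the use of the mean of f_(x) and f_(y) for the otherwise unspecified f, and the default ball radius of 0.021335 m (half of a 42.67 mm diameter ball).

```python
def ball_translation(u_ball, v_ball, rad_ball_px, fx, fy, cx, cy,
                     ball_radius_m=0.021335):
    """Eqs. 10-11: estimate T_MC as the 3D ball position in camera space.

    u_ball, v_ball, rad_ball_px: detected ball center and radius in pixels.
    fx, fy, cx, cy: camera intrinsics in pixels.
    ball_radius_m: assumed real-world ball radius in meters.
    """
    if rad_ball_px <= 0:
        raise ValueError("detected ball radius must be positive")
    # Assumption: f in Eq. 11 is taken as the mean of fx and fy.
    f = 0.5 * (fx + fy)
    z_ball = f / rad_ball_px * ball_radius_m
    x_ball = (u_ball - cx) / fx * z_ball
    y_ball = (v_ball - cy) / fy * z_ball
    return (x_ball, y_ball, z_ball)
```

With f_(x) = f_(y) = 1000, a principal point of [640, 360], a ball detected at pixel [740, 460] with a 10 pixel radius, and an assumed real radius of 0.02 m, the estimate is approximately [0.2, 0.2, 2.0] in meters. Because Z_(ball) scales inversely with the detected radius, a one pixel error on a 10 pixel radius shifts the depth estimate by roughly 10 percent, which is one reason an accurate ball size estimate matters.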

It should be noted that if, in another system, the origin of the M-Tracer™ (or other IMU system) coordinate system is not the golf ball position, it is easy to convert to the presently defined coordinate system by subtracting the known club head position at impact time from the whole IMU system trajectory. It should also be noted that other objects around the ball (which have the same distance from the camera) with known size could be used to estimate Z_(ball). For instance:

Golf club: since the size of the golf club is known, it could be used for distance estimation. For a side view this estimation has no problem, but for other angles (e.g., frontal view) the golf club should be positioned straight as shown in FIG. 6.

$Z_{ball} = \frac{f}{\text{Length}_{\text{club in 2D image}}} \text{Length}_{\text{club in real world}}$

Human height: the height of the person who swings is entered into the application and the person stands straight.

$Z_{ball} = \frac{f}{\text{Height}_{\text{human in 2D image}}} \text{Height}_{\text{human in real world}}$

FIG. 6 depicts positioning a golf club 602 straight up and down for a distance estimation from a frontal camera view. Also note that any of the distance estimation methods could give an approximation for the size of other objects with known size. For example, Z_(ball) could be found by the ball size and then used to estimate the height of the person or straight club in the image.

FIG. 7 is a flow diagram of a method to align x and y axes of IMU coordinate space and camera coordinate space. Aligning the XY plane can be done by finding the yaw angle. This method can be performed by Module 3: X-Y-Axis alignment (R(ψ)).

To estimate yaw angle, two solutions are proposed. One solution is to maximize the projection similarity of the IMU system golf club line and the detected club line in frames of the video. The other solution is to minimize the error between the re-projected IMU system location and the detected IMU system location in 2D video. To further refine a result, an iterative process is applied to remove the outliers. In the following embodiment, the alignment is performed using the golf club line.

In an action 702, a golf club line is detected in multiple frames. In an action 704, the yaw angle is estimated according to equation 12. In an action 706, outliers are detected, and an outlier set is formed. In a determination action 708, it is determined whether an outlier set is empty. If the answer is yes, flow proceeds to the action 712, and the yaw angle is output. If the answer is no, flow proceeds to the action 710, to remove the outlier, and from there to the action 704 in order to iterate the estimation of the yaw angle.

If the outlier set is not empty in the determination action 708, flow alternatively could proceed to the action 714 to remove the outlier, and from there to action 718. Action 718 is also preceded by action 716, in which the golf club line is detected in multiple frames. In the action 718, the yaw angle is estimated according to equation 16 as an alternative to the action 704 using equation 12. Flow proceeds from the action 718 to the action 706, to detect outliers.
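The iterate-until-no-outliers flow of FIG. 7 can be sketched generically as follows. This is only an illustrative sketch: the function name, the caller-supplied `estimate` and `residual` callables, the residual threshold used to decide what counts as an outlier, and the iteration cap are all assumptions, since the outlier test itself is not specified here.

```python
def estimate_with_outlier_removal(frames, estimate, residual, threshold,
                                  max_iters=50):
    """Repeat estimation and outlier removal until the outlier set is empty.

    frames:    per-frame observations (e.g. detected club lines).
    estimate:  callable(list_of_frames) -> estimate (e.g. a yaw angle).
    residual:  callable(estimate, frame) -> non-negative error for that frame.
    threshold: assumed cutoff; frames with a larger residual are outliers.
    """
    inliers = list(frames)
    if not inliers:
        raise ValueError("at least one frame is required")
    for _ in range(max_iters):
        value = estimate(inliers)                      # actions 704 / 718
        kept = [f for f in inliers
                if residual(value, f) <= threshold]    # action 706
        if len(kept) == len(inliers):                  # action 708: set empty
            return value, inliers                      # action 712
        if not kept:
            raise ValueError("all frames were rejected as outliers")
        inliers = kept                                 # actions 710 / 714
    # Iteration cap reached: re-estimate on the last inlier set.
    return estimate(inliers), inliers
```

As a toy usage, passing the per-frame values [10.0, 10.2, 9.8, 30.0] with a mean estimator, an absolute-difference residual and a threshold of 6.0 rejects the value 30.0 on the first pass and then converges on the remaining three frames.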

The yaw angle is estimated by maximizing the projection similarity of the M-Tracer™ golf club line and the detected club line in the frames. Assume that the detected line at frame i is l^(i)=[l_(x1) ^(i), l_(y1) ^(i), l_(x2) ^(i), l_(y2) ^(i)] and the corresponding M-Tracer™ golf club line (i.e., the line connecting M-Tracer™ position to the club head position) is denoted by mtl^(i)=[l_(x1) ^(i), l_(y1) ^(i), l_(z1) ^(i), l_(x2) ^(i), l_(y2) ^(i), l_(z2) ^(i)].

The system solves the following maximization equation to find the yaw angle:

argmax_(ψ) Σ_(i∉Outliers) sim(π(K, T, mtl^(i)), l^(i))   (Eq. 12)

where π is the image projection function, K is the camera intrinsic matrix, T is the final transformation matrix from the IMU system coordinate system to the camera system that is defined as follows:

${T = {{\begin{bmatrix} {{R\left( {\theta,\varphi} \right)}{R(\psi)}} & T_{MC} \\ 0 & 1 \end{bmatrix}\mspace{14mu} {and}\mspace{14mu} {R(\psi)}} = \begin{bmatrix} {\cos \; \psi} & {\sin \; \psi} & 0 \\ {{- \sin}\; \psi} & {\cos \; \psi} & 0 \\ 0 & 0 & 1 \end{bmatrix}}},$

and sim is the similarity of the projected line and the detected line at frame i. Each part is explained below.

The image projection function, π(K, T, l), projects a 3D line segment l=[l_(x1), l_(y1), l_(z1), l_(x2), l_(y2), l_(z2)] determined by two points [l_(x1), l_(y1), l_(z1)]^(T), [l_(x2), l_(y2), l_(z2)]^(T) into a 2D line segment l^(proj)=[l_(x1) ^(proj), l_(y1) ^(proj), l_(x2) ^(proj), l_(y2) ^(proj)]:

$\begin{matrix} {{\pi \left( {K,T,\left\lbrack {l_{x\; 1},l_{y\; 1},l_{z\; 1},l_{x\; 2},l_{y\; 2},l_{z\; 2}} \right\rbrack} \right)} \text{:}\mspace{14mu} \left\{ {\begin{matrix} {{l_{x\; 1}^{proj} = \frac{\left( {{KT}\left\lbrack {l_{x\; 1},l_{y\; 1},l_{z\; 1},1} \right\rbrack}^{T} \right)_{x}}{\left( {{KT}\left\lbrack {l_{x\; 1},l_{y\; 1},l_{z\; 1},1} \right\rbrack}^{T} \right)_{z}}},} \\ {{l_{y\; 1}^{proj} = \frac{\left( {{KT}\left\lbrack {l_{x\; 1},l_{y\; 1},l_{z\; 1},1} \right\rbrack}^{T} \right)_{y}}{\left( {{KT}\left\lbrack {l_{x\; 1},l_{y\; 1},l_{z\; 1},1} \right\rbrack}^{T} \right)_{z}}},} \\ {{l_{x\; 2}^{proj} = \frac{\left( {{KT}\left\lbrack {l_{x\; 2},l_{y\; 2},l_{z\; 2},1} \right\rbrack}^{T} \right)_{x}}{\left( {{KT}\left\lbrack {l_{x\; 2},l_{y\; 2},l_{z\; 2},1} \right\rbrack}^{T} \right)_{z}}},} \\ {l_{y\; 2}^{proj} = \frac{\left( {{KT}\left\lbrack {l_{x\; 2},l_{y\; 2},l_{z\; 2},1} \right\rbrack}^{T} \right)_{y}}{\left( {{KT}\left\lbrack {l_{x\; 2},l_{y\; 2},l_{z\; 2},1} \right\rbrack}^{T} \right)_{z}}} \end{matrix}.} \right.} & \left( {{Eq}.\mspace{14mu} 13} \right) \end{matrix}$

The similarity function sim is defined as:

$\begin{matrix} {{{{sim}\left( {\left\lbrack {l_{x\; 1}^{1},l_{y\; 1}^{1},l_{x\; 2}^{1},l_{y\; 2}^{1}} \right\rbrack,\left\lbrack {l_{x\; 1}^{2},l_{y\; 1}^{2},l_{x\; 2}^{2},l_{y\; 2}^{2}} \right\rbrack} \right)} = \frac{\left( {\left\lbrack {{l_{x\; 1}^{1} - l_{x\; 2}^{1}},{l_{y\; 1}^{1} - l_{y\; 2}^{1}}} \right\rbrack \cdot \left\lbrack {{l_{x\; 1}^{2} - l_{x\; 2}^{2}},{l_{y\; 1}^{2} - l_{y\; 2}^{2}}} \right\rbrack} \right)^{2}}{\left( {\left( {l_{x\; 1}^{1} - l_{x\; 2}^{1}} \right)^{2} + \left( {l_{y\; 1}^{1} - l_{y\; 2}^{1}} \right)^{2}} \right)\left( {\left( {l_{x\; 1}^{2} - l_{x\; 2}^{2}} \right)^{2} + \left( {l_{y\; 1}^{2} - l_{y\; 2}^{2}} \right)^{2}} \right)}},} & \left( {{Eq}.\mspace{14mu} 14} \right) \end{matrix}$

where · is the dot product.
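As an illustration only, Eq. 12 to Eq. 14 can be sketched in Python as follows. The sketch assumes a simple pinhole intrinsic matrix K described by a focal length `f` and principal point `(cx, cy)`, treats R(θ, φ) as a given 3×3 matrix `r_tilt`, and finds the maximum by an exhaustive search over whole degrees. These choices, and all function names, are assumptions made for the example rather than details taken from the disclosure.

```python
import math

def rot_z(psi):
    """R(psi) in the form given in the text (rotation about the z axis)."""
    c, s = math.cos(psi), math.sin(psi)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_vec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]

def project_point(p, psi, r_tilt, t_mc, f, cx, cy):
    """Project one 3D IMU-space point into the image (one endpoint of Eq. 13)."""
    q = mat_vec(r_tilt, mat_vec(rot_z(psi), p))
    x, y, z = (q[k] + t_mc[k] for k in range(3))
    # Assumed pinhole K: dividing K*[x, y, z] by its z component.
    return (f * x / z + cx, f * y / z + cy)

def project_line(line3d, psi, r_tilt, t_mc, f, cx, cy):
    """Eq. 13: project a 3D segment [x1, y1, z1, x2, y2, z2] to [x1, y1, x2, y2]."""
    x1, y1 = project_point(line3d[0:3], psi, r_tilt, t_mc, f, cx, cy)
    x2, y2 = project_point(line3d[3:6], psi, r_tilt, t_mc, f, cx, cy)
    return [x1, y1, x2, y2]

def sim(l1, l2):
    """Eq. 14: squared cosine of the angle between two 2D segments."""
    ax, ay = l1[0] - l1[2], l1[1] - l1[3]
    bx, by = l2[0] - l2[2], l2[1] - l2[3]
    dot = ax * bx + ay * by
    return dot * dot / ((ax * ax + ay * ay) * (bx * bx + by * by))

def estimate_yaw_deg(lines3d, lines2d, r_tilt, t_mc, f, cx, cy, outliers=()):
    """Eq. 12 by exhaustive search over 0..359 degrees (an assumed strategy)."""
    def score(deg):
        psi = math.radians(deg)
        return sum(
            sim(project_line(l3, psi, r_tilt, t_mc, f, cx, cy), l2)
            for i, (l3, l2) in enumerate(zip(lines3d, lines2d))
            if i not in outliers
        )
    return max(range(360), key=score)
```

Because sim depends only on the direction of the two segments, it is insensitive to where along the shaft the club is detected, which is why a few well detected lines can be enough to recover the single unknown ψ.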

An alternate method is to perform alignment with IMU system location. The yaw angle can also be estimated by minimizing the re-projection error of the IMU system location. To detect the IMU system device in the 2D video frames, a color or texture marker can be attached to the device. An alternative technique is to detect the LED light source on the device.

Let [X_(MT) ^(i), Y_(MT) ^(i), Z_(MT) ^(i)]^(T) be the 3D location of the IMU system at time i, represented in IMU system coordinate space. Its projection to 2D image space, denoted as [û_(MT) ^(i), v̂_(MT) ^(i)], can be represented as

[x_(MT) ^(i), y_(MT) ^(i), z_(MT) ^(i)]^(T)=K(R(θ, φ)R(ψ)[X_(MT) ^(i), Y_(MT) ^(i), Z_(MT) ^(i)]^(T)+T_(MC))   (Eq. 15)

û_(MT) ^(i)=x_(MT) ^(i)/z_(MT) ^(i)

v̂_(MT) ^(i)=y_(MT) ^(i)/z_(MT) ^(i)

Let [u_(MT) ^(i), v_(MT) ^(i)] be the detected IMU system location in the 2D video frame. The yaw angle can then be estimated by minimizing the reprojection error

argmin_(ψ)Σ_(i∉Outliers)∥[u_(MT) ^(i)−û_(MT) ^(i), v_(MT) ^(i)−v̂_(MT) ^(i)]∥  (Eq. 16)

where ∥·∥ denotes the L2 norm.

After initial estimation of the yaw angle by either method described above, the results are further refined via an iterative outlier removal process, in some embodiments.

(1) Use N frames to estimate the yaw angle and calculate the transformation T=[R(θ, φ)R(ψ)|T_(MC)].

(2) Find outliers, O, using the estimated T.

(3) Set new outliers, O_(new)=O.

(4) Do the following until O_(new) is empty:

(a) Estimate T by removing outliers (O) from computation

(b) Find outliers, O_(new), using estimated T

(c) O=O+O_(new)
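The iterative outlier removal steps listed above can be sketched as a small Python loop. The two callbacks, `estimate` and `find_outliers`, are hypothetical placeholders for the yaw estimation and the outlier check; their names and signatures are assumptions made for this example.

```python
def refine_with_outlier_removal(frame_ids, estimate, find_outliers):
    """Repeat estimation until no new outliers are found.

    estimate(ids) returns a transformation estimated from the given frames.
    find_outliers(T, ids) returns the frames in ids that disagree with T.
    """
    frame_ids = list(frame_ids)
    T = estimate(frame_ids)                          # step (1)
    outliers = set(find_outliers(T, frame_ids))      # step (2)
    new_outliers = set(outliers)                     # step (3)
    while new_outliers:                              # step (4)
        inliers = [i for i in frame_ids if i not in outliers]
        if not inliers:
            break  # nothing left to estimate from (guard added for the sketch)
        T = estimate(inliers)                        # step (a)
        new_outliers = set(find_outliers(T, inliers))  # step (b)
        outliers |= new_outliers                     # step (c)
    return T, outliers
```

The loop terminates because every pass that finds new outliers removes at least one frame from the inlier set.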

To find outliers for a given transformation T (assuming that T is approximately correct), the system checks if the 2D detected lines are parallel to the 3D projected lines. This check is done by comparing the distances from each of the two ends of the projected 3D line segment to the corresponding 2D detected line. If the difference of the distances is higher than a threshold, then the detected 2D line is an outlier.

Assume that l^(proj)=[l_(x1) ^(proj), l_(y1) ^(proj), l_(x2) ^(proj), l_(y2) ^(proj)] is the projection of a 3D line, l, using the operator π(K, T, l) and the corresponding 2D detected line is l^(2D)=[l_(x1) ^(2D), l_(y1) ^(2D), l_(x2) ^(2D), l_(y2) ^(2D)]. The outlier function ⊚(l^(2D), l^(proj), ε) returns 1 if the 2D detected line is an outlier and is defined as:

$\begin{matrix} {{\odot \left( {l^{2D},l^{proj},\varepsilon} \right)} = \left\{ \begin{matrix} {0,} & {{{if}\mspace{14mu} \left| {{\vartheta \left( {l^{2D},\left\lbrack {l_{x\; 1}^{proj},l_{y\; 1}^{proj}} \right\rbrack} \right)} - {\vartheta \left( {l^{2D},\left\lbrack {l_{x\; 2}^{proj},l_{y\; 2}^{proj}} \right\rbrack} \right)}} \right|} \leq \varepsilon} \\ {1,} & {otherwise} \end{matrix} \right.} & \left( {{Eq}.\mspace{14mu} 17} \right) \end{matrix}$

where ϑ is the distance of a point to a line:

$\begin{matrix} {{\vartheta \left( {l^{2D},\left\lbrack {l_{x}^{proj},l_{y}^{proj}} \right\rbrack} \right)} = {\frac{{\left( {l_{x}^{proj} - l_{x\; 1}^{2D}} \right)\left( {l_{y\; 2}^{2D} - l_{y\; 1}^{2D}} \right)} - {\left( {l_{y}^{proj} - l_{y\; 1}^{2D}} \right)\left( {l_{x\; 2}^{2D} - l_{x\; 1}^{2D}} \right)}}{\sqrt{\left( {l_{x\; 1}^{2D} - l_{x\; 2}^{2D}} \right)^{2} + \left( {l_{y\; 1}^{2D} - l_{y\; 2}^{2D}} \right)^{2}}}.}} & \left( {{Eq}.\mspace{14mu} 18} \right) \end{matrix}$
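A minimal Python sketch of the outlier check in Eq. 17 and Eq. 18 follows. The function names are chosen for the example, and the sketch assumes that the difference of the two signed distances is compared in absolute value.

```python
import math

def point_line_distance(line2d, point):
    """Eq. 18: signed distance from a point to the detected 2D line."""
    x1, y1, x2, y2 = line2d
    px, py = point
    num = (px - x1) * (y2 - y1) - (py - y1) * (x2 - x1)
    return num / math.hypot(x1 - x2, y1 - y2)

def is_outlier(line2d, line_proj, eps):
    """Eq. 17: 1 if the projected segment is not parallel to the detected line."""
    d1 = point_line_distance(line2d, line_proj[0:2])
    d2 = point_line_distance(line2d, line_proj[2:4])
    return 1 if abs(d1 - d2) > eps else 0
```

For example, a detected line along the x axis and a projected segment three pixels above it have equal end distances, so the line is kept; a projected segment that rises from one pixel to six pixels above it differs by five pixels and is rejected for any ε below five.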

FIG. 8 is a flow diagram of a method for user assisted mode. This can be performed by Module 4: User Assisted Mode. Note that the user does not need to give a precisely accurate location, but a region of interest is considered around the given location for ball and line detection algorithms (described with reference to FIGS. 11-15).

In an action 802, the video with the IMU system trajectory overlaid is shown. In a decision action 804, it is determined whether the user is satisfied. If yes, the flow ends. If no, flow proceeds to the action 806, to ask the user to click on a location of a ball (e.g., with a touchscreen or mouse or other user input). In an action 808, module 2 for ball detection is run on a window around the click location. In an action 810, module 3 for x y axis alignment is run. In an action 812, the video with the IMU system trajectory overlaid is shown. In a decision action 814, it is determined whether the user is satisfied. If yes, the flow ends. If no, flow proceeds to the action 816, to ask the user to click on the location of the IMU system in a new frame. In an action 818, a region of interest is defined around the click location for this frame, and made the initial input for module 3. Flow proceeds to the action 810, to run module 3 again.

FIGS. 9A and 9B show examples of user assisted mode: clicking on a location of the golf ball 902 (FIG. 9A) in a video frame 906 and clicking on a location of the IMU 904 (FIG. 9B) in a video frame 908. This could be performed using a touchscreen or cursor controls on a mobile device displaying the video frame. Alternatively, this could be performed on another computing device with a view screen and other types of user entry devices such as keyboard, mouse, trackball, etc.

The above sections describe the method to align 2D video and IMU system signal by using a mobile device with an embedded IMU sensor. In this section, an alignment method with a regular camera that does not have an IMU sensor is described. The method performs the alignment by using a marker placed on the floor as shown in FIGS. 10 and 11.

FIG. 10 depicts a marker coordinate system. In this embodiment, the marker coordinate system has x and y axes 1008, 1010 on a plane defined by the marker 1002, which could be a sheet or a plate (e.g. of paper, cardboard, plastic, metal or cloth or other material) with dots or other pattern 1004 on a surface. The z axis 1006 points upward, perpendicular to the plane and the sheet or plate.

FIG. 11 shows measurements 1102 of a ball location in marker coordinate space. The marker of FIG. 10 is placed on the floor at some distance from the ball. The distance from the ball can be determined by x, y and z axis measurements.

A review of Eq. 1 shows the equation can be further extended as

[x _(c) , y _(c) , z _(c)]^(T) =R _(MC) [x _(T) , y _(T) , z _(T)]^(T) +T _(MC)   (Eq.19)

=R _(RC)(R _(MR) [x _(T) , y _(T) , z _(T)]^(T) +T _(MR))+T _(RC)

R_(RC) and T_(RC) are the rotation and translation from marker space to camera space, which can be obtained by a marker detection algorithm. Thus the only remaining unknowns are R_(MR) and T_(MR), which denote the rotation and translation from IMU system space to marker space. FIG. 10 illustrates the marker coordinate system. According to the right-hand rule, the Z axis of the marker points perpendicularly into the marker plane. T_(MR) is the translation from the IMU system coordinate system to the marker coordinate system, which is the 3D location of the ball represented in marker coordinate space. This 3D location can be measured manually from the ball to the marker origin. An example is illustrated in FIG. 11, where T_(MR)=[−|X|, −|Y|, −|Z|]^(T).

Another way to determine translation is to use the method described above with reference to FIG. 5 and module 2. Eq.19 can be re-written as

[x _(c) , y _(c) , z _(c)]^(T) =R _(MC) [x _(T) , y _(T) , z _(T)]^(T) +T _(MC)

=R _(RC)(R _(MR) [x _(T) , y _(T) , z _(T)]^(T) +T _(MR))+T _(RC)   (Eq.20)

=R _(RC) R _(MR) [x _(T) , y _(T) , z _(T)]^(T) +T _(MC)

where T_(MC) can be obtained following the method described with reference to FIG. 5 and module 2. User assisted mode could be activated to help ball detection as described above with reference to FIG. 8 and module 4.

Because the marker is placed on the floor, it is reasonable to assume that the negative Z-axis of marker space is the gravity_up direction that aligns with the z-axis of the IMU system. The marker can be placed at a variety of locations as long as the marker plane is perpendicular to gravity. Thus,

${R_{MR} = {\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & {- 1} \end{bmatrix}\; {R(\psi)}}},$

where R(ψ) is the rotation around the z-axis and ψ is the yaw angle. The estimation of R(ψ) can be done using the method described with reference to FIG. 7 and module 3. User assisted mode could be activated to help ball detection as described with reference to FIG. 8 and module 4.
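As a small illustration, the matrix R_(MR) above can be built and checked in Python. The helper names are assumptions made for this sketch; R(ψ) uses the same form as given earlier for the yaw rotation.

```python
import math

def r_mr(psi):
    """R_MR = diag(1, 1, -1) * R(psi), with R(psi) a rotation about the z axis."""
    c, s = math.cos(psi), math.sin(psi)
    yaw = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]
    flip = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
    return [
        [sum(flip[r][k] * yaw[k][col] for k in range(3)) for col in range(3)]
        for r in range(3)
    ]

def apply(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]
```

Whatever the yaw angle, `apply(r_mr(psi), [0, 0, 1])` maps the IMU system z-axis onto the negative Z-axis of marker space, which matches the gravity assumption stated above.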

In order to detect a golf club in a frame, in one embodiment, first a background model is constructed. The background model is the average of the frames:

${{BG} = {\frac{1}{e - s}{\sum\limits_{i = s}^{e}I_{i}}}},$

where s and e represent the first and last frames to average. The best range [s e] is the range covering fast moving frames (e.g., from golf club top position to impact) which can be obtained from the synchronized IMU system data. For the frame I_(i), first the background is subtracted and then the image is normalized:

${I_{i} = {I_{i} - {BG}}},{{I_{i}\left( {x,y} \right)} = {\frac{{I_{i}\left( {x,y} \right)} - {\min \left( {I_{i}\left( {x,y} \right)} \right)}}{{\max \left( {I_{i}\left( {x,y} \right)} \right)} - {\min \left( {I_{i}\left( {x,y} \right)} \right)}} \cdot 255.}}$

Now the Canny edge detector and then the probabilistic Hough transform are applied to find lines. The lines are then filtered by following constraints:

Angle constraint: the angle of the line with the x axis should be within one of the following ranges: [α−t, α+t] or [−α−t, −α+t], where α is the expected angle of the golf club with the x axis (e.g., 60°) and t is the tolerance (e.g., 10°). Note that this constraint limits the detection of the golf club to some specific positions. However, it helps to eliminate many false positives. As only one parameter needs to be estimated, a couple of accurately detected lines will be sufficient to solve for the unknown.

Color/Width constraint: the line should have the given minimum color with the minimum width. If multiple lines are detected, the longest line, l^(l)=[l_(x1) ^(l), l_(y1) ^(l), l_(x2) ^(l), l_(y2) ^(l)] (with [l_(x1) ^(l), l_(y1) ^(l)] denoting the start point and [l_(x2) ^(l), l_(y2) ^(l)] the end point of the line), is considered as the golf club. The other lines, l^(i), are compared with l^(l) and if they are consistent (i.e., approximately parallel to l^(l) and within a specific distance), they are kept. Finally, l^(l) and the consistent lines l^(i) are averaged as follows:

$\left\lbrack {l_{x\; 1}^{avg},l_{y\; 1}^{avg}} \right\rbrack = {{\left\lbrack {{\frac{1}{N + 1}\left( {l_{x\; 1}^{l} + {\sum\limits_{i}l_{x\; 1}^{i}} + {t \cdot v_{x}^{i}}} \right)},{\frac{1}{N + 1}\left( {l_{y\; 1}^{l} + {\sum\limits_{i}l_{y\; 1}^{i}} + {t \cdot v_{y}^{i}}} \right)},} \right\rbrack \left\lbrack {l_{x\; 2}^{avg},l_{y\; 2}^{avg}} \right\rbrack} = \left\lbrack {{\frac{1}{N + 1}\left( {l_{x\; 2}^{l} + {\sum\limits_{i}l_{x\; 2}^{i}} + {t \cdot v_{x}^{i}}} \right)},{\frac{1}{N + 1}\left( {l_{y\; 2}^{l} + {\sum\limits_{i}l_{y\; 2}^{i}} + {t \cdot v_{y}^{i}}} \right)},} \right\rbrack}$

where [l_(x1) ^(avg), l_(y1) ^(avg)] denotes the start point of the average line, [l_(x2) ^(avg), l_(y2) ^(avg)] the end point of the average line, t the parametric value of the line and v^(i)=[v_(x) ^(i), v_(y) ^(i)] the vector representing the l^(i) direction:

$t = \frac{{\left( {l_{x}^{l} - l_{x}^{i}} \right) \cdot v_{x}^{i}} + {\left( {l_{y}^{l} - l_{y}^{i}} \right) \cdot v_{y}^{i}}}{\left\| v^{i} \right\|^{2}},$

$\left\lbrack {v_{x}^{i},v_{y}^{i}} \right\rbrack = \left\lbrack {{l_{x\; 2}^{i} - l_{x\; 1}^{i}},{l_{y\; 2}^{i} - l_{y\; 1}^{i}}} \right\rbrack.$

This average line is computed for each given frame.

For golf club tracking, and in order to improve the robustness of the algorithm, in some embodiments the start and end points of the line along with their average (i.e., the middle point) are tracked using the pyramidal implementation of the Lucas-Kanade tracking method. If no line is detected but the points are tracked properly, the line is estimated by fitting a line to the points (e.g., by the least squares method).

For ball detection in some embodiments, in the 2D image the system first detects the golf club (see above) in the address pose (i.e., relatively static frames before the swing starts). Assume that the golf club is detected as [GC_(x1), GC_(y1), GC_(x2), GC_(y2)], where [GC_(x1), GC_(y1)] denote the upper side of the club (e.g., IMU system position) and [GC_(x2), GC_(y2)] is the head of the club. The circular pattern centered at [x, y] and of radius of R is defined as:

${{Pat}\left( {x^{\prime},y^{\prime}} \right)}_{\lbrack{x,y,R}\rbrack} = \left\{ {\begin{matrix} {1,} & {\sqrt{\left( {x - x^{\prime}} \right)^{2} + \left( {y - y^{\prime}} \right)^{2}} \leq R} \\ {0,} & {otherwise} \end{matrix}.} \right.$

The system searches for the circle pattern in a window W by extending the golf club from its head (Ext):

W={[x, y] | ∥[x, y]−Ext∥≤nR}, (n≥1),

Ext={[x, y]=[(GC _(x2) −v(1)·i)−R/2, (GC _(y2) −v(2)·i)−R/2]}, (a≤i≤b),

where ∥·∥ is the L2 norm, a and b denote the search range along the extended line (i.e., represented by a parametric equation), and v is the club line direction vector:

v=[GC _(x1)−GC _(x2), GC _(y1)−GC _(y2)].

To get the initial estimation, the system finds the circular pattern in window, W, by maximizing the cross correlation of the pattern and image I:

$\underset{{\lbrack{x,y}\rbrack} \in W}{\arg \; \max}{\sum\limits_{x^{\prime},y^{\prime}}{{{Pat}\left( {x^{\prime},y^{\prime}} \right)}_{\lbrack{x,y,R}\rbrack} \cdot {{I\left( {x^{\prime},y^{\prime}} \right)}.}}}$
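The initial estimate above amounts to scoring each candidate center by the sum of image intensities under a disk of radius R. A minimal Python sketch follows, assuming the image is a list of rows indexed as `image[y][x]` and the window W is supplied as a list of integer `(x, y)` candidates; both conventions and the function names are assumptions for the example.

```python
def disk_score(image, cx, cy, radius):
    """Cross correlation of the binary disk pattern Pat with the image at (cx, cy)."""
    height, width = len(image), len(image[0])
    total = 0.0
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                total += image[y][x]
    return total

def find_ball_center(image, window, radius):
    """Return the candidate in the window W with the largest disk score."""
    return max(window, key=lambda c: disk_score(image, c[0], c[1], radius))
```

The result is only an initial center; as the text goes on to describe, it is refined with edge detection and a circle Hough transform.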

Once the initial center of the circle, [x, y], is found, a sub image at [x, y] with size of R is considered, and the Canny edge detector and then the circle Hough transform are applied to find the center and radius of the golf ball (i.e., [x_(ball), y_(ball)] and Rad_(ball)). The golf ball may have a non-circular shape due to illumination conditions. Here, several methods are presented to make ball detection and size estimation more robust.

FIGS. 12A and 12B show symmetry of intensity 1202 (FIG. 12A) and symmetry of gradient direction 1204 (FIG. 12B) in images of golf balls. The first method uses symmetry of the ball to exclude false positives. This symmetry holds for intensity and gradient direction, as shown in FIGS. 12A and 12B. If the detected object does not have a symmetric intensity and gradient direction, it is rejected. In order to find symmetry the center of the ball is needed.

FIGS. 13A and 13B depict using a distance transform to find the center of the ball. The distance transform can be used to find the center of the circle. The distance transform has been used to robustly find maximum inscribed circles and could therefore locate the center of the ball more reliably. FIG. 13A shows an example in which the Hough transform gives a smaller circle 1302 due to a shadow problem, but the distance transform 1304 can be used to give a better estimation for the ball center.

FIG. 14 depicts fitting a semicircle 1402 to the ball. The next method is to fit a semi-circle instead of a full circle. Shadow usually makes the circle look smaller than its correct size. By fitting a semi-circle, as shown in FIG. 14, the effect of shadows can be overcome, and a better estimate of the right size of the ball can be made. To fit a semi-circle, the system could consider only large gradients, which typically occur in non-shadow regions, or use color consistency for high intensity values.

FIGS. 15A and 15B depict intensity profiles of lines through images of golf balls. In FIG. 15A, an intensity profile 1502 of a line 1504 not passing through the center of the ball is shown. In FIG. 15B, an intensity profile 1506 of a line 1508 passing through the center of the ball is shown. The next method determines the size of the ball and its center from the intensity profile of a line through a circle. To use such intensity profiles, the intensity profile of each of multiple lines is determined. Then, the line with the largest width is selected. The line with the largest width passes through the center in this embodiment. In this embodiment, the width is the diameter of the ball and the mid-point in the intensity profile is the center of the circle, i.e., the center of the ball.
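The intensity profile method can be sketched in Python as follows. The sketch assumes horizontal scan lines, a single bright ball on a darker background, and a fixed intensity threshold; these simplifications and the function name are assumptions for the example.

```python
def ball_from_profiles(image, thresh):
    """Pick the scan line with the widest bright run; return (center, diameter).

    image is a list of rows indexed as image[y][x]. Returns None if no pixel
    exceeds the threshold.
    """
    best = None  # (width, center_x, y)
    for y, row in enumerate(image):
        xs = [x for x, value in enumerate(row) if value > thresh]
        if not xs:
            continue
        width = xs[-1] - xs[0] + 1
        if best is None or width > best[0]:
            best = (width, (xs[0] + xs[-1]) / 2.0, y)
    if best is None:
        return None
    width, cx, cy = best
    return (cx, cy), width
```

The widest profile gives the diameter, and its mid-point gives the center, as described for FIG. 15B.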

Temporal alignment is described next. Since the IMU system and the camera system run on different clocks, a method is needed to align the timing of the systems together, i.e., temporal alignment. In this disclosure a method to automatically synchronize the timing of IMU system and the camera system is described. After synchronization, the IMU system captured signal (i.e., the IMU captured golf club swing trajectory) and video (i.e., the two-dimensional video of the golf club swing) are temporally aligned and can be further aligned spatially.

With reference to FIGS. 1 and 16-20, a method and apparatus to synchronize (i.e., temporally align) the IMU system captured golf swing trajectory signal with the 2D video captured by a static camera are disclosed. The method has the following steps:

(1) Detect the golf ball location from a 2D video frame.

(2) Detect an impact frame by searching backward from the last frame of the video for the first frame in which the ball appears, i.e., in which the brightness at the ball location becomes large enough. In order to remove false detections due to environmental noise such as lighting changes, background object(s), etc., the result is further verified by using a temporal pattern.

(3) Calculate the time bias by aligning the detected impact frame from the video and the impact time from the IMU system. The time bias is the time difference between the first frame of the video and the first frame of the IMU system signal.

(4) Refine time bias by minimizing spatial alignment error with the temporally shifted IMU system golf club swing signal.

A method of detecting impact time (i.e., the moment at which the golf club head impacts the golf ball) is as follows. The method compares the image intensity change between two neighboring frames (the (i−1)^(th) and i^(th) frames) of a video of a golf club swing, at a determined ball location relative to each of the frames. If such change is larger than a threshold, the impact time is determined as the (i−1)^(th) frame. This is further described below with reference to FIGS. 17 and 18.

Using intensity change alone to detect impact time usually suffers from noise caused by environmental factors such as lighting changes and background object occlusion, resulting in false detections. Improving upon the method above, a further method disclosed herein uses a temporal pattern to verify the initial detection result from image intensity so that detection accuracy is improved. In addition, a method to synchronize the IMU system signal and the video by using the detected impact frame is disclosed herein.

FIG. 16 is a flow diagram of a method for determining a time bias between a video frame and an inertial measurement unit sample, for synchronizing. One goal of the present system and method embodiments is to synchronize the IMU system signal and the video signal, i.e., determine the time bias between the first video frame and first IMU system sample. Let N be the number of frames in the video. In an action 1602, the background image is estimated from the last few images. In an action 1604, ball detection is performed and the ball location is obtained, relative to coordinates of the frame. In an action 1606, a parameter i (e.g., a counting parameter) is set equal to N, the number of frames in the video. In an action 1608, the background is subtracted from frame i. In a decision action 1610, it is determined whether the intensity at a pixel at the previously determined ball location but in the background subtracted frame i is greater than a threshold. If no, in the action 1618, the parameter i is decremented. Flow proceeds to action 1608, for the next frame i. If yes, in the decision action 1610, flow proceeds to the decision action 1612, to determine whether a temporal pattern is satisfied. If no, the parameter i is decremented in the action 1618 and flow proceeds back to the action 1608 for the next frame i. If yes, in the decision action 1612, the initial time bias is estimated in the action 1614, using the RGB video 1620 and the IMU system trajectory with timestamp 1622. Then, the time bias is refined in the action 1616, using the IMU system trajectory with timestamp 1622 and camera intrinsics 1624.

The system takes RGB video, IMU system trajectory with time stamp information and camera intrinsics as input. The system first detects a golf ball location from 2D video frame(s). Thus proceeding backwards from the last frame, where the ball is assumed to be outside of the view, the system checks the intensity value of the detected ball center, in each frame. If such intensity value is bigger than a threshold and is further verified by a temporal pattern, the current frame will be determined as the impact time frame. The initial time bias can then be calculated with a known IMU system impact time stamp as well as the video frame rate. Due to the nature of the method, the accuracy of the time bias is at most within 1 frame. A refinement process is followed to further improve the accuracy by shifting IMU system trajectory samples by Δt around the calculated time bias and finding the optimal Δt with the smallest spatial alignment error.

Image processing methods described above for detecting a ball-shaped object can be applied to detecting the golf ball from the 2D images in the video. In some embodiments, instead of detecting the ball appearance directly on a video frame, such detection is performed by the system on background subtracted images to reduce background noise. Let V_(i) denote the i^(th) frame and I_(i) denote the background subtracted frames, defined as

${I_{i} = {{V_{i} - {BG}}}},{{I_{i}\left( {x,y} \right)} = {\frac{{I_{i}\left( {x,y} \right)} - {\min \left( {I_{i}\left( {x,y} \right)} \right)}}{{\max \left( {I_{i}\left( {x,y} \right)} \right)} - {\min \left( {I_{i}\left( {x,y} \right)} \right)}} \cdot 255}},$

where 11 is the absolute value and BG is the average of the last M (e.g., M=10) frames that we are sure the ball does not appear:

${{BG} = {\frac{1}{M}{\underset{i = {N - M + 1}}{\sum\limits^{N}}V_{i}}}},$

where N is the total number of frames.
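A minimal Python sketch of this background subtraction follows, assuming single-channel frames stored as lists of rows. The function names, and the choice to return an all-zero image when a frame equals the background exactly, are assumptions made for the example.

```python
def background_from_last(frames, m):
    """BG: per-pixel average of the last m frames of the video."""
    tail = frames[-m:]
    height, width = len(tail[0]), len(tail[0][0])
    return [
        [sum(f[y][x] for f in tail) / len(tail) for x in range(width)]
        for y in range(height)
    ]

def subtract_and_normalize(frame, bg):
    """I_i = |V_i - BG|, rescaled so that its values span 0..255."""
    diff = [[abs(p - b) for p, b in zip(row, bg_row)]
            for row, bg_row in zip(frame, bg)]
    lo = min(min(row) for row in diff)
    hi = max(max(row) for row in diff)
    if hi == lo:
        # Frame identical to the background: avoid dividing by zero.
        return [[0.0] * len(row) for row in diff]
    return [[(p - lo) / (hi - lo) * 255.0 for p in row] for row in diff]
```

In practice an array library would be used for speed; nested lists are used here only to keep the sketch self-contained.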

Starting from the last frame of the video of the golf club swing, the system detects when there is a sudden intensity change in the location of the ball, i.e., it checks each frame at the ball location [u, v] to determine if the corresponding value is bigger than a threshold. The sudden intensity change in the ball location does not always indicate the impact time. In some cases, noise or other objects may appear at the location of the ball, and if the resulting change in intensity is greater than the threshold, it may cause a false detection. An example of such a scenario is presented in FIGS. 17A-C.

Therefore, in some embodiments, a verification step is followed after such an intensity change frame is found. The following pseudo code illustrates the process.

for i = N : −1 : 1
  if (I_(i)(u, v) > T)
    if VerificationUsingTemporalPattern(I_(n)(u, v), n = i − K, ..., i + K) = true
      id_impactFrame = i
      exit for loop
    end
  end
end

A threshold T can be defined as the average of the maximum and minimum of the intensity value at the ball location: T=1/2(max_(i)I_(i)(u, v)+min_(i)I_(i)(u, v)), for i=1:N.

FIGS. 17A-C show an example of false impact frame 1704 detection. Here, the golf club appears in the same location as the golf ball after the swing. When the algorithm starts from the end of video, Frame 102 of FIG. 17A appears before Frame 68 of FIG. 17B and the change in intensity 1702 at the location of the ball in Frame 102 is higher than the threshold because of the golf club as illustrated in FIG. 17C. This can cause the algorithm to find the wrong impact frame in some embodiments.

FIG. 18 depicts a temporal pattern of intensity 1802 at the ball location, in video frames. In the verification step, the temporal pattern is used based on the observation that the intensity value before the impact frame should be high (when the ball exists) and the intensity value after impact should be low (the ball disappears). FIG. 18 illustrates a binary temporal pattern template with a window size of A+B frames, i.e., P_(tmp)=[1, 1, . . . , 1, 0, . . . , 0]. Once an impact frame candidate is found, the system checks the corresponding intensity value for A consecutive frames before and B frames after, and calculates its temporal pattern P_(i), defined as

P_(i) = [b_(n), n = i − A:i + B],

$b_{n} = \left\{ \begin{matrix} 1 & {{I_{n}\left( {u,v} \right)} > T} \\ 0 & {{I_{n}\left( {u,v} \right)} \leq T} \end{matrix} \right.$

Thus the impact frame candidate will be determined as a true impact frame if D(P_(i), P_(tmp))<T_(p), where D(·) is a distance metric and T_(p) is a pre-defined threshold. In one example, D(·) can be defined as the Hamming distance.
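The backward search with temporal pattern verification can be sketched in Python as follows. The input `intensity` is assumed to hold the background subtracted value at the ball location for every frame. The sketch also assumes that the pattern window covers the A frames ending at the candidate and the B frames after it, so that it has the same length A+B as the template, and that candidates whose window does not fit inside the video are skipped. These choices and the function names are assumptions for the example.

```python
def hamming(p, q):
    """Number of positions at which two equal-length patterns differ."""
    return sum(1 for a, b in zip(p, q) if a != b)

def detect_impact_frame(intensity, a, b, t_p):
    """Return the index of the verified impact frame, or None if none is found."""
    n = len(intensity)
    t = 0.5 * (max(intensity) + min(intensity))  # threshold T
    template = [1] * a + [0] * b                 # P_tmp
    for i in range(n - 1, -1, -1):               # search from the last frame
        if intensity[i] <= t:
            continue
        lo, hi = i - a + 1, i + b
        if lo < 0 or hi >= n:
            continue  # window does not fit inside the video
        pattern = [1 if intensity[k] > t else 0 for k in range(lo, hi + 1)]
        if hamming(pattern, template) < t_p:
            return i
    return None
```

With an intensity trace in which the ball is visible for frames 0 to 9 and the club briefly crosses the ball location at frames 20 and 21, an intensity-only test would stop at frame 21, while the pattern check with A=B=3 and T_(p)=1 rejects that candidate and returns frame 9.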

The system estimates the initial time bias as follows. Let f_(v) and f_(M) denote the frame rate of the video and the sampling rate of the IMU system signal, and let i_(v) and i_(M) denote the impact frame detected from the video and the impact sample detected from the IMU system. Because a frame index divided by its frame rate gives a time in seconds, the time bias is t_(diff)=i_(v)/f_(v)−i_(M)/f_(M).

However, in various embodiments of the system, the time bias should be refined. Due to the finite frame rate of the video, the highest accuracy of temporal synchronization based on the above approach is within ±1 video frame. In some embodiments, the frame rate (i.e., the sampling rate for IMU data) of the IMU system is much higher than that of the camera (e.g., video f_(v)=30 frames per second, IMU f_(M)=100 frames per second). Thus, a refinement step is performed to further improve the accuracy of the time bias to within ±1 IMU system frame.

With t_(diff) as the initial temporal alignment, spatial alignment is performed as described with reference to FIGS. 1-15. Then, the IMU system signal is shifted by Δt (i.e., one or multiple IMU system sample intervals) forward and backward as shown in FIG. 19.

FIG. 19 depicts temporal alignment by changing the time offset of the inertial measurement unit samples or frames relative to video frames. Three alignments are shown, with the IMU system samples 1902, 1904, 1906 aligned to the video frame 1908 at moment of ball impact with offset 0 1902 (initial time bias), offset+1 1904, and offset-1 1906.

For each adjustment Δt, corresponding spatial alignment error is calculated and the temporal alignment with the smallest spatial alignment error is selected.

T _(diff) =t _(diff) +Δt _(m)

Δt _(m)=argmin_(Δt) _(i) Err (v, MT(t−t_(diff)−Δt_(i)))

where Err(·) denotes the spatial alignment error measurement, and v and MT denote the trajectories obtained from the video and the IMU system, respectively.
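The initial estimate and its refinement can be sketched in Python as follows. The sketch assumes that f_(v) and f_(M) are rates in samples per second, so that an index divided by a rate is a time in seconds, and it represents the spatial alignment error as a hypothetical callback `error_fn` that takes a candidate total time bias and returns the error. Both function names are assumptions for the example.

```python
def initial_time_bias(i_v, f_v, i_m, f_m):
    """Time of the impact frame in the video minus time of the impact sample."""
    return i_v / f_v - i_m / f_m

def refine_time_bias(t_diff, f_m, max_shift, error_fn):
    """Try shifts of -max_shift..+max_shift IMU samples and keep the best one.

    error_fn(bias) stands in for Err(v, MT(t - t_diff - delta_t)).
    """
    candidates = [k / f_m for k in range(-max_shift, max_shift + 1)]
    best_shift = min(candidates, key=lambda dt: error_fn(t_diff + dt))
    return t_diff + best_shift
```

For example, an impact at video frame 45 at 30 frames per second and at IMU sample 120 at 100 samples per second gives an initial bias of about 0.3 seconds, which the refinement can then adjust in steps of 0.01 seconds.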

FIG. 20 is a block diagram of a golf coaching system in accordance with the present disclosure. An inertial measurement unit 2001, such as the M-Tracer™ in some embodiments, is attached to a golf club 2003, and is used for capturing inertial measurement of a golf club swing by a golfer. A camera 2005 of a mobile device 2007 captures a two-dimensional video of the golf club swing. The mobile device 2007 is equipped with an inertial measurement unit 2009 and a display 2011.

A computing device 2013 has a processor 2017, memory 2019, and a wireless module 2015. The computing device receives captured inertial measurement of the golf club swing from the inertial measurement unit 2001 attached to the golf club 2003, via the wireless module 2015 of the computing device 2013. Also, the computing device receives the captured video from the camera 2005 of the mobile device 2007, via the wireless module 2015 of the computing device 2013.

The computing device has a z axis alignment module 2021, a translation module 2023, an x y axis alignment module, a user assisted mode module 2027, a temporal alignment module 2029, and an overlay module 2031, each of which could be implemented in software executing on the processor 2017, hardware, firmware, or combination thereof. These modules implement functions described above with reference to FIGS. 1-19. The computing device 2013 performs spatial and temporal alignment of the golf club inertial measurement data and the two-dimensional video for the golf club swing, and overlays a projected golf club swing onto the video. Then, the computing device 2013 sends the overlaid video, via the wireless module 2015 of the computing device 2013, to the mobile device 2007, which shows the overlaid video on the display 2011 of the mobile device 2007.

In a further embodiment, the computing device 2013 is integrated with the mobile device 2007, for example when the mobile device 2007 is a tablet that has a camera 2005, an inertial measurement unit 2009 and a display 2011. Many tablets, as is known, also have one or more processors and memory, a wireless module, user input devices, etc. In a still further embodiment, the camera 2005 is separate, and a display 2011 is coupled to or integrated with the computing device 2013. Video could be delivered from the camera 2005 to the computing device 2013 via a wired connection, or removal of memory from the camera 2005 and insertion of the memory into the computing device 2013, e.g., using a memory stick. In a yet further embodiment, the computing device 2013 is combined with the inertial measurement unit 2001 attached to the golf club 2003, so that the system needs only a mobile device 2007 with a camera 2005, or a separate camera 2005, and the inertial measurement unit with combined computing device 2013.

FIG. 21 is a flow diagram of a method for temporal alignment of an inertial measurement unit captured golf club swing and a video of the golf club swing. The method can be practiced with an inertial measurement unit attached to a golf club, a camera such as on a mobile device, and a computing device, which could be separate from or integrated with one or both of the other devices. In an action 2101, a golf club swing is captured using an inertial measurement unit attached to a golf club. The captured golf club swing inertial measurement unit data is communicated to the computing device, for example by a wireless or wired connection. In an action 2103, a video of the golf club swing is captured, using a camera. This could be a camera on a mobile device such as a tablet or a smartphone, or a dedicated camera such as a video camera. The captured video is communicated to the computing device, for example by a wireless or wired connection.

In an action 2105, the computing device detects an impact frame in the video. In an action 2107, a time bias is determined for the inertial measurement unit captured golf club swing and the captured video of the golf club swing. In an action 2109, the time bias is refined, by minimizing a spatial alignment error between the inertial measurement unit captured golf club swing and the captured video of the golf club swing. In an action 2111, a projected golf club swing trajectory is overlaid onto the video, using the time bias for temporal alignment. In an action 2113, the overlaid video is displayed, for example on a display connected to the computing device, or on a display of the mobile device. The computing device could communicate the overlaid video to the mobile device via a wireless or wired connection.
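By way of illustration only, actions 2105 through 2109 could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names, the use of the midpoint between maximum and minimum intensity as the threshold, the sign convention of the time bias (inertial measurement time equals video time plus bias), and the caller-supplied `spatial_error` function are all hypothetical. The input `intensity` is assumed to hold, for each video frame, the background-subtracted brightness at the detected golf ball location.

```python
from typing import Callable, Sequence


def detect_impact_frame(intensity: Sequence[float]) -> int:
    """Search backwards through the video for the last frame in which the
    ball still appears at its detected location (action 2105).

    intensity[i] is the brightness at the ball location in frame i; it is
    expected to be high while the ball is present and low after impact.
    """
    # Threshold placed between the maximum and minimum intensity.
    # Using the exact midpoint is an assumption of this sketch.
    threshold = (max(intensity) + min(intensity)) / 2.0
    for i in range(len(intensity) - 1, -1, -1):
        if intensity[i] > threshold:
            return i
    raise ValueError("no frame exceeds the threshold at the ball location")


def initial_time_bias(impact_frame: int, fps: float,
                      imu_impact_time: float) -> float:
    """Time bias from aligning the impact frame to the IMU impact time
    (action 2107). Assumed convention: t_imu = t_video + bias, with both
    times in seconds from the start of their own recording.
    """
    return imu_impact_time - impact_frame / fps


def refine_time_bias(bias: float,
                     spatial_error: Callable[[float], float],
                     step: float, max_steps: int) -> float:
    """Shift the bias forwards and backwards in increments of `step` and
    keep the candidate with the smallest spatial alignment error
    (action 2109). `spatial_error` is a hypothetical callable that returns
    the spatial alignment error for a given candidate bias.
    """
    candidates = [bias + k * step for k in range(-max_steps, max_steps + 1)]
    return min(candidates, key=spatial_error)
```

In such a sketch, the refined time bias returned by `refine_time_bias` would then be used for temporal alignment when overlaying the projected golf club swing trajectory onto the video in action 2111. A step of one video frame period or less would be a reasonable choice for `step`, although the appropriate search range depends on the embodiment.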

It should be appreciated that the methods described herein may be performed with a digital processing system, such as a conventional, general-purpose computer system. Special purpose computers, which are designed or programmed to perform only one function, may be used in the alternative. FIG. 22 is an illustration showing an exemplary computing device which may implement the embodiments described herein. The computing device of FIG. 22 may be used to perform embodiments of the functionality for temporal and spatial alignment in accordance with some embodiments. The computing device includes a central processing unit (CPU) 2201, which is coupled through a bus 2205 to a memory 2203, and mass storage device 2207. Mass storage device 2207 represents a persistent data storage device such as a floppy disc drive or a fixed disc drive, which may be local or remote in some embodiments. The mass storage device 2207 could implement a backup storage, in some embodiments. Memory 2203 may include read only memory, random access memory, etc. Applications resident on the computing device may be stored on or accessed via a computer readable medium such as memory 2203 or mass storage device 2207 in some embodiments. Applications may also be in the form of modulated electronic signals accessed via a network modem or other network interface of the computing device. It should be appreciated that CPU 2201 may be embodied in a general-purpose processor, a special purpose processor, or a specially programmed logic device in some embodiments.

Display 2211 is in communication with CPU 2201, memory 2203, and mass storage device 2207, through bus 2205. Display 2211 is configured to display any visualization tools or reports associated with the system described herein. Input/output device 2209 is coupled to bus 2205 in order to communicate information in command selections to CPU 2201. It should be appreciated that data to and from external devices may be communicated through the input/output device 2209. CPU 2201 can be defined to execute the functionality described herein to enable the functionality described with reference to FIGS. 1-21. The code embodying this functionality may be stored within memory 2203 or mass storage device 2207 for execution by a processor such as CPU 2201 in some embodiments. The operating system on the computing device may be MS DOS™, MS-WINDOWS™, OS/2™, UNIX™, LINUX™, or other known operating systems. It should be appreciated that the embodiments described herein may also be integrated with a virtualized computing system that is implemented with physical computing resources.

Detailed illustrative embodiments are disclosed herein. However, specific functional details disclosed herein are merely representative for purposes of describing embodiments. Embodiments may, however, be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein.

It should be understood that although the terms first, second, etc. may be used herein to describe various steps or calculations, these steps or calculations should not be limited by these terms. These terms are only used to distinguish one step or calculation from another. For example, a first calculation could be termed a second calculation, and, similarly, a second step could be termed a first step, without departing from the scope of this disclosure. As used herein, the term “and/or” and the “/” symbol include any and all combinations of one or more of the associated listed items.

As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “includes”, and/or “including”, when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Therefore, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.

It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may in fact be executed substantially concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.

With the above embodiments in mind, it should be understood that the embodiments might employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms, such as producing, identifying, determining, or comparing. Any of the operations described herein that form part of the embodiments are useful machine operations. The embodiments also relate to a device or an apparatus for performing these operations. The apparatus can be specially constructed for the required purpose, or the apparatus can be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines can be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

A module, an application, a layer, an agent or other method-operable entity could be implemented as hardware, firmware, or a processor executing software, or combinations thereof. It should be appreciated that, where a software-based embodiment is disclosed herein, the software can be embodied in a physical machine such as a controller. For example, a controller could include a first module and a second module. A controller could be configured to perform various actions, e.g., of a method, an application, a layer or an agent.

The embodiments can also be embodied as computer readable code on a tangible non-transitory computer readable medium. The computer readable medium is any data storage device that can store data, which can be thereafter read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network coupled computer system so that the computer readable code is stored and executed in a distributed fashion. Embodiments described herein may be practiced with various computer system configurations including hand-held devices, tablets, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers and the like. The embodiments can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a wire-based or wireless network.

Although the method operations were described in a specific order, it should be understood that other operations may be performed in between described operations, described operations may be adjusted so that they occur at slightly different times or the described operations may be distributed in a system which allows the occurrence of the processing operations at various intervals associated with the processing.

In various embodiments, one or more portions of the methods and mechanisms described herein may form part of a cloud-computing environment. In such embodiments, resources may be provided over the Internet as services according to one or more various models. Such models may include Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). In IaaS, computer infrastructure is delivered as a service. In such a case, the computing equipment is generally owned and operated by the service provider. In the PaaS model, software tools and underlying equipment used by developers to develop software solutions may be provided as a service and hosted by the service provider. SaaS typically includes a service provider licensing software as a service on demand. The service provider may host the software, or may deploy the software to a customer for a given period of time. Numerous combinations of the above models are possible and are contemplated.

Various units, circuits, or other components may be described or claimed as “configured to” perform a task or tasks. In such contexts, the phrase “configured to” is used to connote structure by indicating that the units/circuits/components include structure (e.g., circuitry) that performs the task or tasks during operation. As such, the unit/circuit/component can be said to be configured to perform the task even when the specified unit/circuit/component is not currently operational (e.g., is not on). The units/circuits/components used with the “configured to” language include hardware (for example, circuits, memory storing program instructions executable to implement the operation, etc.). Reciting that a unit/circuit/component is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. 112, sixth paragraph, for that unit/circuit/component. Additionally, “configured to” can include generic structure (e.g., generic circuitry) that is manipulated by software and/or firmware (e.g., an FPGA or a general-purpose processor executing software) to operate in a manner that is capable of performing the task(s) at issue. “Configured to” may also include adapting a manufacturing process (e.g., a semiconductor fabrication facility) to fabricate devices (e.g., integrated circuits) that are adapted to implement or perform one or more tasks.

The foregoing description, for the purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the embodiments and their practical applications, to thereby enable others skilled in the art to best utilize the embodiments and various modifications as may be suited to the particular use contemplated. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

What is claimed is:
 1. A method for temporal alignment of golf club inertial measurement data and two-dimensional video for golf club swing analysis, comprising: capturing inertial measurement data of a golf club swing on an inertial measurement unit (IMU); and sending the inertial measurement data of the golf club swing from the inertial measurement unit to a computing device, so that the computing device calculates a time bias between the inertial measurement data of the golf club swing and a two-dimensional video of the golf club swing, based on the computing device detecting an impact frame in the two-dimensional video and the computing device aligning the detected impact frame to an impact time from the captured inertial measurement data of the golf club swing.
 2. The method of claim 1, wherein the computing device detecting the impact frame is based on the computing device detecting a location of a golf ball in a frame in the two-dimensional video.
 3. The method of claim 1, wherein the time bias includes a time difference between a first frame of the two-dimensional video and a first frame of the inertial measurement data of the golf club swing.
 4. The method of claim 1, wherein the computing device overlays a projected golf club swing trajectory, based on the inertial measurement data of the golf club swing, onto the two-dimensional video of the golf club swing, aligned using the time bias.
 5. The method of claim 1, wherein: the sending the inertial measurement data of the golf club swing to the computing device is through a wireless connection; and the computing device receives the two-dimensional video of the golf club swing from a mobile device through a further wireless connection.
 6. A method for temporal alignment of golf club inertial measurement data and two-dimensional video for golf club swing analysis, performed by a computing device, comprising: receiving captured inertial measurement data of a golf club swing from an inertial measurement unit (IMU) attached to a golf club; receiving, from a device having a video camera, a two-dimensional video of the golf club swing; detecting an impact frame showing an object impacted by the golf club in the two-dimensional video; and calculating a time bias based on aligning the detected impact frame from the two-dimensional video and an impact time from the captured inertial measurement data of the golf club swing.
 7. The method of claim 6, further comprising: refining the time bias, by minimizing spatial alignment error of the captured inertial measurement data of the golf club swing and the two-dimensional video of the golf club swing.
 8. The method of claim 6, further comprising: detecting a location of the object in a frame in the two-dimensional video; and searching backwards, relative to a time span of the two-dimensional video, through frames of the two-dimensional video for a frame in which the object appears at the detected location, to detect the impact frame.
 9. The method of claim 6, further comprising: determining a background image from a plurality of frames of the two-dimensional video; subtracting the background image from a further plurality of frames of the two-dimensional video; and analyzing intensity at a specific location in the further plurality of frames having the background image subtracted, the specific location corresponding to detection of the object in a still further plurality of frames of the two-dimensional video, wherein the detecting the impact frame is based on the analyzing the intensity.
 10. The method of claim 6, further comprising: verifying the time bias through analysis of intensity at a determined object location in frames of the two-dimensional video, to detect false impact frames in the two-dimensional video.
 11. The method of claim 6, further comprising: overlaying a projected golf club swing trajectory onto the two-dimensional video, based on temporal alignment of the inertial measurement data of the golf club swing and the two-dimensional video, using the time bias.
 12. The method of claim 6, wherein the device having the video camera is a mobile device, and further comprising: sending a video having a projected golf club trajectory aligned using the time bias and overlaid on the two-dimensional video of the golf club swing, to the mobile device for display on the mobile device.
 13. A tangible, non-transitory, computer-readable media having instructions thereupon which, when executed by a processor, cause the processor to perform a method comprising: receiving, from an inertial measurement unit (IMU), inertial measurement data of a golf club swing; receiving a two-dimensional video of the golf club swing; determining a location of an object in at least one frame in the two-dimensional video; determining an impact frame, by searching backwards from a later frame to an earlier frame in the two-dimensional video for a frame in which brightness at the determined location relative to the frame exceeds a threshold; aligning the detected impact frame from the two-dimensional video to an impact time from the inertial measurement data of the golf club swing; determining a time bias between the frames of the two-dimensional video and frames of the inertial measurement data of the golf club swing based on the aligning; and overlaying a projected golf club swing trajectory onto the two-dimensional video of the golf club swing, based on temporal alignment, with the time bias, of the inertial measurement data of the golf club swing and the two-dimensional video.
 14. The computer-readable media of claim 13, wherein the receiving or capturing a two-dimensional video comprises: receiving the two-dimensional video from a mobile device via a wireless connection.
 15. The computer-readable media of claim 13, wherein the method further comprises: performing spatial alignment of the inertial measurement data of the golf club swing and the two-dimensional video of the golf club swing, with the time bias for an initial temporal alignment; shifting temporal alignment of the inertial measurement data of the golf club swing and the two-dimensional video forwards or backwards relative to the initial temporal alignment; determining spatial alignment error for each adjustment in the temporal alignment; and determining a temporal alignment having a minimum spatial alignment error, as a refined time bias.
 16. The computer-readable media of claim 13, wherein the method further comprises: determining a maximum intensity at the location of the object in frames of the two-dimensional video; determining a minimum intensity at the location in frames of the two-dimensional video, wherein the minimum intensity occurs in a frame with the object absent from the frame; and determining the threshold between the maximum intensity and the minimum intensity.
 17. The computer-readable media of claim 13, wherein the method further comprises: determining a temporal pattern in frames of the two-dimensional video, with intensity at the location in frames before the impact frame being greater than intensity at the location in frames after the impact frame; and verifying selection of the impact frame, based on the temporal pattern.
 18. The computer-readable media of claim 13, wherein the method further comprises: determining a set of frames of the two-dimensional video in which the object does not appear; determining an average background from the set of frames; and forming a set of background subtracted images, based on subtracting the average background from frames of the two-dimensional video, wherein the determining the impact frame is based on the set of background subtracted images.
 19. The computer-readable media of claim 13, wherein the method further comprises: sending a video having the projected golf swing trajectory aligned, based on the time bias, and overlaid on the two-dimensional video of the golf club swing, to a mobile device for display.
 20. The computer-readable media of claim 13, wherein the method further comprises: analyzing intensity, at the location of the object determined in the at least one frame, in frames of the two-dimensional video; and determining whether the impact frame is a false impact frame, based on the analyzing the intensity. 